Search for: All records

Creators/Authors contains: "Roberts, Nicholas"


  1. Weak supervision (WS) is a popular approach for label-efficient learning, leveraging diverse sources of noisy but inexpensive weak labels to automatically annotate training data. Despite its wide usage, WS and its practical value are challenging to benchmark due to the many knobs in its setup, including: data sources, labeling functions (LFs), aggregation techniques (called label models), and end model pipelines. Existing evaluation suites tend to be limited, focusing on particular components or specialized use cases. Moreover, they often involve simplistic benchmark tasks or de-facto LF sets that are suboptimally written, producing insights that may not generalize to real-world settings. We address these limitations by introducing a new benchmark, BOXWRENCH, designed to more accurately reflect real-world usages of WS. This benchmark features tasks with (1) higher class cardinality and imbalance, (2) notable domain expertise requirements, and (3) opportunities to re-use LFs across parallel multilingual corpora. For all tasks, LFs are written using a careful procedure aimed at mimicking real-world settings. In contrast to existing WS benchmarks, we show that supervised learning requires substantial amounts (1000+) of labeled examples to match WS in many settings. 
    Free, publicly-accessible full text available December 10, 2025
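As a concrete illustration of the label-model step described in the abstract above (a generic sketch, not BOXWRENCH's own aggregation technique), a minimal majority-vote label model over labeling-function outputs might look like this:

```python
from collections import Counter

ABSTAIN = -1  # labeling functions may abstain on an example

def majority_vote(lf_votes):
    """Aggregate one example's LF votes into a pseudo-label (None if all abstain)."""
    counts = Counter(v for v in lf_votes if v != ABSTAIN)
    if not counts:
        return None
    return counts.most_common(1)[0][0]

# Each row: votes from three hypothetical LFs on one example.
votes = [
    [1, 1, ABSTAIN],              # two LFs agree on class 1
    [0, ABSTAIN, 0],              # two LFs agree on class 0
    [ABSTAIN, ABSTAIN, ABSTAIN],  # all abstain -> no pseudo-label
]
labels = [majority_vote(v) for v in votes]
print(labels)  # [1, 0, None]
```

Real label models weight each LF by its estimated accuracy rather than treating all sources equally, which matters most in the high-cardinality, imbalanced settings the benchmark targets.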
  2. Machine learning models—including prominent zero-shot models—are often trained on datasets whose labels are only a small proportion of a larger label space. Such spaces are commonly equipped with a metric that relates the labels via distances between them. We propose a simple approach that exploits this information to adapt the trained model to reliably predict new classes—or, in the case of zero-shot prediction, to improve its performance—without any additional training. Our technique is a drop-in replacement for the standard prediction rule, swapping arg max with the Fréchet mean. We provide a comprehensive theoretical analysis of this approach, studying (i) learning-theoretic results trading off label space diameter, sample complexity, and model dimension, (ii) characterizations of the full range of scenarios in which it is possible to predict any unobserved class, and (iii) an optimal active learning-like next-class selection procedure for obtaining optimal training classes when it is not possible to predict the entire range of unobserved classes. Empirically, using easily available external metrics, our proposed approach, LOKI, gains up to a 29.7% relative improvement over SimCLR on ImageNet and scales to hundreds of thousands of classes. When no such metric is available, LOKI can use self-derived metrics from class embeddings and obtains a 10.5% improvement on pretrained zero-shot models such as CLIP. 
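The prediction-rule swap described above can be sketched in a few lines. The setup below is invented for illustration (classes placed on a line with an absolute-difference metric, and a model trained on only two of them); LOKI's actual metrics and models differ:

```python
import numpy as np

def frechet_predict(probs, dist):
    """
    probs: (n_seen,) model scores over the seen classes.
    dist:  (n_all, n_seen) metric distances from every candidate class
           (seen or unseen) to each seen class.
    Returns the candidate minimizing the weighted Fréchet objective
    sum_c probs[c] * d(y, c)^2, which may be a class never seen in training.
    """
    objective = (dist ** 2) @ probs
    return int(np.argmin(objective))

# Toy label metric: classes at positions 0, 1, 2, 3 on a line;
# the model was trained only on classes 0 and 2 (hypothetical setup).
positions = np.array([0.0, 1.0, 2.0, 3.0])
seen = [0, 2]
dist = np.abs(positions[:, None] - positions[seen][None, :])

probs = np.array([0.5, 0.5])  # model is torn between seen classes 0 and 2
print(frechet_predict(probs, dist))  # predicts the unseen in-between class 1
```

With arg max, the model could only ever return class 0 or class 2; the Fréchet-mean rule lets the metric pull the prediction toward an unobserved class that sits between them.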
  3. Among promising applications of metal‐halide perovskites, the most research progress has been made on perovskite solar cells (PSCs). Data from the large body of published work make it possible to leverage machine learning (ML) to significantly expedite material and device optimization, and potentially to design novel configurations. This paper represents one of the first efforts to provide open‐source ML tools built on the Perovskite Database Project (PDP), the most comprehensive open‐source PSC database to date, with over 43 000 entries from the published literature. Three ML model architectures with short‐circuit current density (Jsc) as the target are trained on the PDP. Using the XGBoost architecture, a root mean squared error (RMSE) of 3.58, an R² of 0.35, and a mean absolute percentage error (MAPE) of 9.49% are achieved. This performance is comparable to results reported in the literature and can likely be improved through further investigation. To overcome the challenges of manual database creation, an open‐source data-cleaning pipeline is created for PDP data. Through these tools, which have been published on GitHub, this research aims to make ML available to aid PSC design while demonstrating the already promising performance achieved. The tools can be adapted for other applications, such as perovskite light‐emitting diodes (PeLEDs), if a sufficient database is available. 
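The error metrics reported above follow their standard definitions; a minimal sketch (the Jsc values below are invented for illustration, not PDP data):

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error, in the units of the target."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def mape(y_true, y_pred):
    """Mean absolute percentage error, as a percentage."""
    return float(np.mean(np.abs((y_true - y_pred) / y_true)) * 100)

# Hypothetical short-circuit current densities (mA/cm^2), illustrative only.
y_true = np.array([20.0, 22.5, 18.0, 25.0])
y_pred = np.array([21.0, 22.0, 19.5, 24.0])
print(rmse(y_true, y_pred))  # absolute-scale error
print(mape(y_true, y_pred))  # relative error in percent
```

Reporting both is useful here: RMSE is in the target's physical units, while MAPE normalizes by the true value, which makes results comparable across devices with different current scales.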
  4. Weak supervision (WS) frameworks are a popular way to bypass hand-labeling large datasets for training data-hungry models. These approaches synthesize multiple noisy but cheaply acquired label estimates into a set of high-quality pseudo-labels for downstream training. However, the synthesis technique is specific to a particular kind of label, such as binary labels or sequences, and each new label type requires manually designing a new synthesis algorithm. Instead, we propose a universal technique that enables weak supervision over any label type while still offering desirable properties, including practical flexibility, computational efficiency, and theoretical guarantees. We apply this technique to important problems previously not tackled by WS frameworks, including learning to rank, regression, and learning in hyperbolic space. Theoretically, our synthesis approach produces consistent estimators for learning some challenging but important generalizations of the exponential family model. Experimentally, we validate our framework and show improvement over baselines in diverse settings, including real-world learning-to-rank and regression problems along with learning on hyperbolic manifolds. 
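To give a flavor of weak supervision over a continuous label type, here is a simplified precision-weighted aggregation of noisy real-valued estimates; the paper's actual estimator for exponential-family generalizations is more involved, and the source variances here are assumed known for illustration:

```python
import numpy as np

def aggregate_regression(estimates, variances):
    """
    Combine noisy real-valued label estimates into one pseudo-label by
    precision weighting: lower-variance sources count more.
    A simplified stand-in for a regression label model, not the paper's method.
    estimates: (n_sources,) noisy estimates of one example's label
    variances: (n_sources,) assumed noise variance of each source
    """
    weights = 1.0 / np.asarray(variances, dtype=float)
    weights /= weights.sum()
    return float(weights @ np.asarray(estimates, dtype=float))

# Three hypothetical sources estimate a label whose true value is 10.0.
estimates = [9.0, 10.5, 14.0]
variances = [1.0, 1.0, 4.0]  # the third source is assumed noisier
print(aggregate_regression(estimates, variances))  # pulled toward the reliable sources
```

The point of the example is the interface, not the estimator: the same "aggregate noisy sources into one pseudo-label" step must be redefined per label type, which is the limitation the universal technique above removes.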
  5. Most existing neural architecture search (NAS) benchmarks and algorithms prioritize well-studied tasks, e.g., image classification on CIFAR or ImageNet. This makes the performance of NAS approaches in more diverse areas poorly understood. In this paper, we present NAS-Bench-360, a benchmark suite to evaluate methods on domains beyond those traditionally studied in architecture search, and use it to address the following question: do state-of-the-art NAS methods perform well on diverse tasks? To construct the benchmark, we curate ten tasks spanning a diverse array of application domains, dataset sizes, problem dimensionalities, and learning objectives. Each task is carefully chosen to interoperate with modern CNN-based search methods while possibly being far afield from its original development domain. To speed up and reduce the cost of NAS research, for two of the tasks we release the precomputed performance of 15,625 architectures comprising a standard CNN search space. Experimentally, we show the need for the more robust NAS evaluation that NAS-Bench-360 enables by showing that several modern NAS procedures perform inconsistently across the ten tasks, with many catastrophically poor results. We also demonstrate how NAS-Bench-360 and its associated precomputed results will enable future scientific discoveries by testing whether several recent hypotheses promoted in the NAS literature hold on diverse tasks. NAS-Bench-360 is hosted at https://nb360.ml.cmu.edu. 
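A precomputed ("tabular") benchmark like the 15,625-architecture release above lets experiments replace training with a table lookup. A toy sketch with invented task names and numbers, not the actual NAS-Bench-360 data or API:

```python
# Stand-in for a precomputed NAS benchmark: architecture performance is
# looked up instead of trained, which is what makes large comparisons cheap.
# All entries below are invented for illustration.
precomputed = {
    "task_a": {"arch_1": 0.91, "arch_2": 0.88, "arch_3": 0.95},
    "task_b": {"arch_1": 0.60, "arch_2": 0.85, "arch_3": 0.55},
}

def best_arch(task):
    """Return the best architecture for a task by precomputed score."""
    scores = precomputed[task]
    return max(scores, key=scores.get)

# The same search space can rank very differently across tasks, which is
# the kind of cross-task inconsistency the benchmark is built to expose.
print(best_arch("task_a"))  # arch_3
print(best_arch("task_b"))  # arch_2
```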
  6. The Paleoarchean East Pilbara Terrane of Western Australia is a dome-and-keel terrane that is often highlighted as recording a vertically convective tectonic regime in the early Earth. In this model, termed ’partial convective overturn’, granitic domes diapirically rose through a dense, foundering mafic supracrustal sequence. The applicability of partial convective overturn to the East Pilbara Terrane and to other Archean dome-and-keel terranes is widely debated and has significant implications for early Earth geodynamics. A critical data gap in the East Pilbara Terrane is the internal structure of the granitic domes. We present field-based, microstructural, and anisotropy of magnetic susceptibility (AMS) data collected within the Mt Edgar dome to understand its internal structure and assess its compatibility with existing dome formation models. Field and microstructural observations suggest that most fabric development occurred under submagmatic and high-temperature solid-state conditions. The AMS results reveal a coherent, dome-wide structural pattern: 1) sub-vertical lineations plunge radially inward towards the center of the dome, and foliations across much of the dome consistently strike northwest; 2) shallowly plunging lineations define an arch that extends from the center of the dome to the southwest margin; and 3) migmatitic gneisses, which represent the oldest granitic component of the dome, are folded and flattened against the margin of the dome in two distinct lobes. The structural relationships between rocks of different ages indicate that units of different crystallization ages deformed synchronously during the last major pulse of granitic magmatism. These data are broadly consistent with a vertical tectonics model, and we synthesize our structural results to propose a three-stage diapiric evolution of the Mt Edgar dome. The critical stage of dome development was between 3.3 and 3.2 Ga, when widespread, melt-assisted flow of the deep crust led to the formation of a steep-walled, composite dome. These data suggest that diapiric processes were important for the formation of dome-and-keel terranes in the Paleoarchean. 
  7. Weak supervision (WS) is a powerful method for building labeled datasets to train supervised models in the face of little-to-no labeled data. It replaces hand-labeling data with aggregating multiple noisy-but-cheap label estimates expressed by labeling functions (LFs). While it has been used successfully in many domains, weak supervision's application scope is limited by the difficulty of constructing labeling functions for domains with complex or high-dimensional features. To address this, a handful of methods have proposed automating the LF design process using a small set of ground truth labels. In this work, we introduce AutoWS-Bench-101: a framework for evaluating automated WS (AutoWS) techniques in challenging WS settings -- a set of diverse application domains to which it has previously been difficult or impossible to apply traditional WS techniques. While AutoWS is a promising direction toward expanding the application scope of WS, the emergence of powerful methods such as zero-shot foundation models reveals the need to understand how AutoWS techniques compare or cooperate with modern zero-shot or few-shot learners. This informs the central question of AutoWS-Bench-101: given an initial set of 100 labels for each task, we ask whether a practitioner should use an AutoWS method to generate additional labels or use some simpler baseline, such as zero-shot predictions from a foundation model or supervised learning. We observe that in many settings, it is necessary for AutoWS methods to incorporate signal from foundation models if they are to outperform simple few-shot baselines, and AutoWS-Bench-101 promotes … 